[codex] Optimize Vesuvius training runtime and validation #802
giorgioangel wants to merge 3 commits into merge-ink-pipelines from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e9f706a32f
```python
# Deterministic skeleton targets are generated in the dataset before augmentation.
return None
```
Restore validation skeleton transform for non-Zarr datasets
create_validation_transforms now always returns None, but MutexAffinityDataset._initialize_transforms still invokes this helper when skeleton losses are configured. In that path, validation no longer produces *_skel tensors, and BaseTrainer._compute_loss_value will call skeleton losses without the required skel argument, which raises for DC_SkelREC_and_CE_loss/SoftSkeletonRecallLoss. This breaks mutex-affinity validation whenever skeleton-supervised losses are enabled.
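A minimal sketch, with hypothetical names, of the restored behavior this comment asks for: the helper should return a transform that emits `{target}_skel` tensors (rather than always `None`) whenever skeleton-supervised losses are configured.

```python
from typing import Callable, Optional

import numpy as np


def create_validation_transforms(
    skeleton_targets: list,
    skeleton_fn: Callable[[np.ndarray], np.ndarray],
) -> Optional[Callable[[dict], dict]]:
    """Hypothetical sketch: return a transform producing '{key}_skel'
    tensors for skeleton-supervised losses, or None when no skeleton
    targets are configured. Names are illustrative, not the PR's API."""
    if not skeleton_targets:
        return None

    def transform(sample: dict) -> dict:
        for key in skeleton_targets:
            # Skeleton losses (e.g. SoftSkeletonRecallLoss) need a
            # 'skel' tensor alongside each supervised target.
            sample[f"{key}_skel"] = skeleton_fn(np.asarray(sample[key]))
        return sample

    return transform
```

With a transform like this, `MutexAffinityDataset._initialize_transforms` would again receive a callable instead of `None` on the non-Zarr path.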
```python
cache_key = self._cache_key(patch_info, target_key, ignore_value)
cached = self._cache_get(cache_key)
if cached is not None:
```
Disable skeleton cache for augmented training samples
The new cache key is based only on static patch_info (volume/position/patch size), but this transform is appended after stochastic augmentations in the training pipeline. That means repeated patches can reuse a cached skeleton computed for a different augmented variant, so {target}_skel can diverge from the current target tensor and silently corrupt skeleton-supervised training on ZarrDataset runs.
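One way to address this, sketched below with hypothetical names: fold a content hash of the post-augmentation target into the cache key, so a cached skeleton can only be reused when the augmented tensor is byte-identical. Alternatively, the cache could simply be bypassed in training mode.

```python
import hashlib

import numpy as np


def augmentation_aware_cache_key(
    patch_info: tuple, target_key: str, ignore_value, target: np.ndarray
) -> str:
    """Hypothetical replacement for _cache_key: combine the static patch
    identity with a digest of the augmented target tensor, so different
    augmented variants of the same patch never collide in the cache."""
    digest = hashlib.sha1(np.ascontiguousarray(target).tobytes()).hexdigest()
    return f"{patch_info}|{target_key}|{ignore_value}|{digest}"
```

The hashing cost is per-patch and small next to skeletonization, but in training the simpler fix is usually to skip the cache entirely.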
What changed
- vesuvius: `edt` fast path for `surface` dilation (stack only/ephemeral) while keeping scipy fallback semantics for non-binary cases

Why
- ps128 throughput gains from batch_size: 24

Validation
Ran on the remote repo under /home/ubuntu/villa/vesuvius:
- `.venv/bin/python3 -m py_compile` on the touched Python files
- `PYTHONPATH=src .venv/bin/pytest tests/models/test_validation_preview.py tests/models/test_zarr_dataset_dilation.py`
- ps128 and ps256 runs (/proc/<pid>/status)

Notes
- .patches_cache/, _codex_backup_20260331/, bench_edt_vs_scipy.py, and the root ps128_medial_default.yaml/ps256_medial_default.yaml
- Targets merge-ink-pipelines because the remote source branch is based on that branch rather than main
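For the /proc/&lt;pid&gt;/status spot-checks mentioned under Validation, a small Linux-only helper for reading the kernel's memory counters might look like the following (illustrative only, not code from the PR):

```python
import os


def vm_counters(pid=None):
    """Parse VmPeak/VmHWM/VmRSS (in kB) from /proc/<pid>/status.
    Linux-only illustrative helper; not part of the PR itself."""
    status_path = f"/proc/{pid or os.getpid()}/status"
    counters = {}
    with open(status_path) as f:
        for line in f:
            if line.startswith(("VmPeak", "VmHWM", "VmRSS")):
                name, value = line.split(":", 1)
                # Values look like "   12345 kB"; keep the integer kB.
                counters[name] = int(value.split()[0])
    return counters
```

VmHWM (high-water mark of resident memory) is usually the most useful counter for comparing ps128 against ps256 runs.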
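The binary fast path described under "What changed" (and measured by bench_edt_vs_scipy.py) rests on a standard equivalence: dilating a binary mask with a Euclidean ball is the same as thresholding the distance transform of the background. A scipy-only sketch of that equivalence follows; the PR presumably wires in the faster `edt` package, and the function name here is illustrative:

```python
import numpy as np
from scipy import ndimage


def dilate_surface_edt(mask: np.ndarray, radius: float) -> np.ndarray:
    """Binary dilation by a Euclidean ball of the given radius:
    a voxel is in the dilation iff its distance to the nearest
    foreground voxel is <= radius."""
    background_dist = ndimage.distance_transform_edt(~mask.astype(bool))
    return background_dist <= radius
```

For non-binary label volumes this shortcut does not apply, which matches the PR's description of keeping scipy fallback semantics for those cases.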